Speed, robustness and static performance of TSPC (True Single Phase Clocking) latches and flipflops are analysed in this paper. New latches and flipflops are proposed to improve the overall speed, power saving, clock slope insensitivity and static performance of TSPC. Both new single-rail and new dual-rail latches and flipflops are proposed, in dynamic, semi-static and fully-static versions. The delays are reduced by factors of 1.3, 2.1, 2.2 and 2.4 for the single-rail dynamic, the dual-rail dynamic, the semi-static and the fully-static versions respectively. At the same time, power consumption is also reduced, so the power-delay products are reduced by factors of 1.9, 3.5, 3.4 and 6.3 respectively for an average activity rate of 0.25. These improvements come with fewer transistors and smaller clock loads. One unique type of the proposed latches uses only a single clocked transistor and only n-transistors in the logic (in both n- and p-latches and in both dynamic and static versions).
TSPC Latches and Flipflops

I. INTRODUCTION

TSPC strategy has been widely accepted as a high-speed CMOS circuit technique. It has obvious advantages such as simple clock generation and distribution, a small number of clocked transistors and high speed [1, 2]. However, some aspects still need to be improved, and in the following we point out four of them. First, in a high-throughput TSPC pipeline formed by cascading p- and n-blocks alternately, the p-blocks are the speed bottlenecks. In order to gain maximal throughput, one often arranges all logic operations in the n-blocks and leaves the p-blocks with no logic operation at all. Even so, when complementary signals are needed, extra inverters have to be placed after the already-slow p-blocks, which limits the maximal throughput. Increasing the speed of the p-blocks is therefore significant for improving the overall throughput of the pipeline.

Second, TSPC is not a non-overlapping clocking system and, consequently, there is an upper limit on the clock slope length beyond which logic gates become unreliable. This upper limit depends on the process parameters and the gate complexity. Its value has been decreasing rapidly and can be less than 1 ns for improperly sized circuits in submicron CMOS technologies. To keep the slope length short, the clock buffer becomes larger and larger. In certain cases, a short slope is required simply by the upper-limit constraint rather than by speed, which wastes power and chip area on a huge clock buffer. Therefore, expanding the upper limit of the clock slope length is significant.

Third, TSPC strategy is a dynamic circuit technique and therefore has a lower limit on its working frequency. In certain applications, e.g. in a long counter, toggle frequencies may become very low and circuits become unreliable. In a noisy environment or when charge sharing occurs, static operation becomes favorable. In order to reduce power consumption, part of the circuit may also need to stay idle temporarily. Therefore, static performance is one of the important robustness issues which, if possible, should be improved.
Finally, power consumption is critical for a heavily pipelined circuit because of its many clocked transistors and precharged nodes. High-performance circuits should be evaluated not only by their short delay but also by their small power-delay product. For this purpose, it is important to reduce the number of clocked transistors and precharged nodes, as well as the total transistor count, in a flipflop.
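To make the power-delay argument concrete, the sketch below uses the standard first-order dynamic-power estimate P = (C_clock + a·C_data)·Vdd²·f, in which the clock load switches every cycle while data-node capacitance switches only at the activity rate a (0.25 is the average rate used in this paper). The function names, capacitances, delays, supply and frequency are hypothetical placeholders rather than simulated values, and short-circuit and leakage power are ignored.

```python
def dynamic_power_w(c_clock_f: float, c_data_f: float, activity: float,
                    vdd_v: float, f_hz: float) -> float:
    """First-order dynamic power in watts (hypothetical helper).

    The clock load (gate capacitance of clocked transistors) is charged every
    cycle; data-node capacitance only with probability `activity`.
    Short-circuit and leakage power are ignored.
    """
    return (c_clock_f + activity * c_data_f) * vdd_v ** 2 * f_hz


def power_delay_product_j(power_w: float, delay_s: float) -> float:
    """Power-delay product in joules."""
    return power_w * delay_s


# Two hypothetical flipflops at Vdd = 5 V, 100 MHz and activity rate 0.25.
p_old = dynamic_power_w(40e-15, 60e-15, 0.25, 5.0, 100e6)
p_new = dynamic_power_w(20e-15, 50e-15, 0.25, 5.0, 100e6)  # smaller clock load
pdp_old = power_delay_product_j(p_old, 0.5e-9)
pdp_new = power_delay_product_j(p_new, 0.4e-9)
print(f"PDP improvement: {pdp_old / pdp_new:.2f}x")
```

Because the clock load is weighted by 1 rather than by the activity rate, removing clocked transistors pays off in the power term even before any delay improvement is counted.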
The intention of this paper is to propose new circuit solutions that meet these demands, based on a better understanding gained through analyses and simulations. All simulations in this paper are done using HSPICE and typical parameters of a 0.8 µm CMOS process [3]. The speed bottleneck of a TSPC pipeline is discussed in section II and the clock slope sensitivity is analysed in section III. A single-stage TSPC full-latch and its speed and power advantages are presented in section IV, while a fast and robust TSPC double pipeline using the full-latch is proposed in section V. Dual-rail latches are discussed in section VI, where completely ratio-insensitive cross-coupled latches and fast flipflop arrangements are suggested. In section VII, dual-rail latches clocked by a single transistor and using n-transistor-only logic for both n- and p-latches are proposed. Static TSPC flipflops are described in section VIII, where the semi-static and fully-static versions of the previously proposed latches are introduced. Performance comparisons are shown in section IX, while conclusions are given in section X.



Drawings: 

The invention will be described below by way of examples depicted in the drawings.

Fig. 1 Four basic stages in TSPC. 

Fig. 2 D and D* generated from p-blocks. 

Fig. 3 D and D* generated from SP-stages. 

Fig. 4 One-direction latches. 

Fig. 5 Instant latching conditions. 

Fig. 6 Worst cases for SN-SN and SP-SP latches. 

Fig. 7 Possible latching failure of TSPC-1 precharged latches.

Fig. 8 Low failure risk by using TSPC-2 type precharged latches. 

Fig. 9 Worst case of SP-SP-SN-SN and of (PP)-SP-SN-SN.

Fig. 10 Worst case of SN-SN-SP-SP and of (PN)-SN-SP-SP. 

Fig. 11 A single-stage TSPC full-latch.

Fig. 12 Previously reported method to stabilize intermediate-nodes. 

Fig. 13 Proposed stabilizing method making the latches three-state.

Fig. 14 The full-latch used after the modified n-latch.

Fig. 15 TSPC split-latches. 

Fig. 16 The full-latch used after a modified n-split-latch.

Fig. 17 The single-stage TSPC full-latch made from an SN-stage.

Fig. 18 Speed-up critical stages in a double pipeline. 

Fig. 19 A fast and robust double-pipeline. 
Fig. 20 CVSL-type latches.

Fig. 21 Cross-coupled latches.

Fig. 22 Delays versus ratio for CVSL-type latches. 

Fig. 23 Delay comparison between CVSL and cross latches. 

Fig. 24 Combination of CVSL n- and cross p-latches. 

Fig. 25 Fast flipflop arrangement. 

Fig. 26 Combinations of TSPC CN-stages and cross p-latches. 

Fig. 27 STC-1 latches. 

Fig. 28 Fast flipflop using STC-1 n-latch. 

Fig. 29 Dynamic STCL flipflops. 

Fig. 30 Semi-static flipflops. 

Fig. 31 Fully-static flipflop constructed by RAM-type latches.

Fig. 32 Fully-static flipflop constructed by static STC-1 latches. 

Fig. 33 Fully static flipflop constructed by static STC-1 n- and static cross p-latches. 

Fig. 34 Semi-static fast flipflops.

Fig. 35 Semi-static flipflops requiring idle-high clock. 

Fig. 36 Fully-static STCL flipflops.

Fig. 37 Semi-static STCL flipflops. 

Fig. 38 Power-delay products of group 1. 

Fig. 39 Power-delay products of group 2. 

Fig. 40 Power-delay products of group 3. 

Fig. 41 Power-delay products of group 4.



II. THE SPEED BOTTLENECK OF A TSPC PIPELINE



There are four basic stages in TSPC: precharged p- and n-stages and non-precharged (static) p- and n-stages, named PP, PN, SP and SN stages respectively, as shown in Fig. 1. These are the simplest stages which can be used to form latches and flipflops. A positive edge-triggered flipflop can be formed, in its precharged version, by a combination of PP-SP-PN-SN or, in its non-precharged version, by a combination of SP-SP-SN-SN. We call the first two stages a p-block (or p-latch) and the last two stages an n-block (or n-latch). A negative edge-triggered flipflop can be formed by exchanging the p- and n-blocks. Logic operations can be included in the flipflops as long as the following rules are obeyed: in stages PP or PN, logic parts are placed between the two clocked transistors and use a single transistor type (p or n); in stages SP or SN, logic parts are placed at both ends and use complementary transistor types [2]. A pipeline can be formed by alternately placing p- and n-blocks, with or without logic included. From the viewpoint of high throughput, we prefer to arrange all logic operations only in the n-blocks and leave the p-blocks as half-clock-cycle delay elements. When complementary inputs to the n-blocks are needed, we have to generate them through the p-blocks. Fig. 2 shows complementary outputs from (a) a precharged p-block and (b) a non-precharged p-block. The p-block in (a) or (b) therefore gives a total delay of three stages, which becomes a speed bottleneck.

In this case, there is another alternative which uses precharged n-blocks and non-precharged p-blocks containing only a single SP stage [2]. This alternative has several advantages. First, the p-block has only a one-stage delay when a single output is enough. Second, the clock load is reduced due to the small number of clocked transistors. Third, compared with a fully precharged pipeline, it has only half the number of precharged nodes and 75% of the stages, which leads to lower power consumption. Finally, since the p-block has only a single stage and its loads are only n-transistors, its size can be small, giving a speed advantage to the preceding n-block. However, there are two constraints. First, it works only when the succeeding n-block is precharged, and the evaluation delay of its PN stage must be less than the evaluation delay of the preceding n-block plus the pull-down delay of the SP stage. This delay condition is usually satisfied, but there is a risk when the succeeding PN stage contains a heavy logic calculation. Second, when complementary outputs are needed, not only an extra inverter but also an extra SP stage has to be added (see Fig. 3), since the inverter cannot be placed between the SP stage and the succeeding precharged n-block.
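The first constraint is a single timing inequality, t_eval(PN) < t_eval(preceding n-block) + t_pd(SP). The sketch below encodes it; the class name and the delay values are hypothetical and serve only to show how a heavy logic function in the succeeding PN stage (a longer evaluation delay) can use up the margin.

```python
from dataclasses import dataclass


@dataclass
class SingleSpTiming:
    """Hypothetical delays (ps) around a p-block made of a single SP stage."""
    prev_nblock_eval_ps: float  # evaluation delay of the preceding precharged n-block
    sp_pulldown_ps: float       # pull-down delay of the SP stage
    next_pn_eval_ps: float      # evaluation delay of the succeeding PN stage

    def margin_ps(self) -> float:
        # New data from the preceding n-block can disturb the SP output only after
        # prev_nblock_eval_ps + sp_pulldown_ps; the succeeding PN stage must have
        # finished evaluating before then.
        return self.prev_nblock_eval_ps + self.sp_pulldown_ps - self.next_pn_eval_ps

    def is_safe(self) -> bool:
        return self.margin_ps() > 0


light = SingleSpTiming(prev_nblock_eval_ps=300, sp_pulldown_ps=100, next_pn_eval_ps=250)
heavy = SingleSpTiming(prev_nblock_eval_ps=300, sp_pulldown_ps=100, next_pn_eval_ps=450)
print(light.is_safe(), heavy.is_safe())
```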

III. CLOCK SLOPE SENSITIVITY

Clock slope sensitivity is an important issue for TSPC as well as for all overlapping clocking systems. The issue has been discussed in the literature [4]. We will focus our discussion on the robust latching conditions and on worst-case analyses for typical TSPC circuits. When we discuss slope sensitivity in the following, we assume that the input of a latch has already been established before the latching edge of the clock starts, so we only need to consider the hold time of the input.

Failure of a single TSPC stage: Among the four basic TSPC stages, the SP and SN stages are one-direction latches while the PP and PN stages are precharged stages. The output of an SP stage can be latched only when it is low, and the output of an SN stage can be latched only when it is high, see Fig. 4. Let us first look at the SN stage, in which a high output will be latched during a low clock phase. One observation is that during a high clock phase, when node a (the output node) is charged to Vdd, node b will be charged to (Vdd - Vnth), so the clocked n-transistor will be on the edge of its off-state; here Vnth is the threshold voltage of the clocked n-transistor, which depends on its source-to-substrate voltage. Assuming that the low input is stable, the output will be latched "instantly" when the clock starts its high-to-low slope, which is indicated in Fig. 5 by the "instant" latch point. If the input is not stable and changes from low to high, node b will be discharged from (Vdd - Vnth) to ground. During the discharging, as long as the Vgs of the clocked n-transistor (denoted Vngs) is less than Vnth, the "instant" latching is still valid. A Vngs exceeding Vnth will cause charge leakage on the output node. Although a small leakage does not mean a 100% latching failure, we can regard the "instant" latching, i.e. Vngs ≤ Vnth, as a robust latching condition. The same holds for an SP stage except that the voltages are just opposite, which is also shown in Fig. 5. In an SP stage, a low output will be latched during a high clock phase and node d will be discharged to |Vpth|, the threshold voltage of the clocked p-transistor, which likewise depends on its source-to-substrate voltage. The robust latching condition (the "instant" latching) requires |Vpgs| ≤ |Vpth|.
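The robust latching condition for the SN stage can be explored with a deliberately crude sketch. Under our own assumptions (not values or waveforms from the paper), the clock falls linearly from Vdd to 0 over the slope time, Vnth is a constant (body-effect variation is ignored), node b starts at Vdd - Vnth, and a late low-to-high input discharges node b linearly, starting t_in after the edge begins and lasting t_dis. The code finds the longest clock slope for which Vngs never exceeds Vnth; the SP case mirrors it with |Vpgs| and |Vpth|. All function names and numbers are hypothetical.

```python
def v_clock(t: float, vdd: float, slope: float) -> float:
    """Clock gate voltage: linear high-to-low ramp that starts at t = 0."""
    if t <= 0.0:
        return vdd
    if t >= slope:
        return 0.0
    return vdd * (1.0 - t / slope)


def v_node_b(t: float, vdd: float, vnth: float, t_in: float, t_dis: float) -> float:
    """Node b: parked at Vdd - Vnth, then discharged linearly after the input rises at t_in."""
    v0 = vdd - vnth
    if t <= t_in:
        return v0
    if t >= t_in + t_dis:
        return 0.0
    return v0 * (1.0 - (t - t_in) / t_dis)


def worst_vngs(vdd, vnth, slope, t_in, t_dis):
    """Largest Vngs = Vclock - Vb. Both waveforms are piecewise linear in t,
    so the maximum over t >= 0 occurs at one of their breakpoints."""
    breakpoints = (0.0, slope, t_in, t_in + t_dis)
    return max(v_clock(t, vdd, slope) - v_node_b(t, vdd, vnth, t_in, t_dis)
               for t in breakpoints)


def is_instant_latch(vdd, vnth, slope, t_in, t_dis, tol=1e-9):
    """Robust ("instant") latching condition: Vngs <= Vnth throughout the edge."""
    return worst_vngs(vdd, vnth, slope, t_in, t_dis) <= vnth + tol


def max_clock_slope(vdd, vnth, t_in, t_dis, hi=100.0, iters=80):
    """Bisection for the longest slope that still latches instantly.
    Valid because Vclock(t), and hence Vngs, never decreases as the slope grows."""
    if is_instant_latch(vdd, vnth, hi, t_in, t_dis):
        raise ValueError("upper bracket still latches; increase hi")
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if is_instant_latch(vdd, vnth, mid, t_in, t_dis):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical: Vdd = 5 V, Vnth = 1 V, input rises 0.2 ns into the edge and
# discharges node b in 0.5 ns (all times in ns).
print(max_clock_slope(5.0, 1.0, t_in=0.2, t_dis=0.5))
```

In this toy model the binding moment is when node b reaches ground, so the bound is Vdd·(t_in + t_dis)/(Vdd - Vnth), about 0.875 ns for these made-up numbers. The qualitative point matches the text: a slower clock edge or a faster late input erodes the margin.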

For a single precharged stage (PP or PN), there is no latching failure since it is not a latching stage. The only possible failure is that, if the precharged node has not been discharged previously, it could be slightly discharged by charge sharing caused by a fast new input and a slow clock. Investigation indicates that this risk is very small, so it will not be discussed further. However, the precharged node signal can cause a latching failure in its succeeding stage within a precharged latch, which will be discussed below.

Failure of a TSPC latch: In non-precharged TSPC latches (SP-SP or SN-SN), the actual latching stage can be either the first or the second, depending on the input state. When a latch is latching, the worst case occurs if the first stage is the actual latching stage, since it is closer to the input. Fig. 6(a) shows the worst latching cases for an SN-SN latch and an SP-SP latch respectively. Conversely, when a latch is unlatching, the fastest output transition occurs if the second stage is the actual latching stage, which creates a worst case for its succeeding latch. Fig. 6(b) shows the worst unlatching cases for an SN-SN latch and an SP-SP latch respectively. For non-precharged TSPC latches, a latching failure occurs only when the input is unstable during latching, which implies that latching failures only occur between two latches. For precharged TSPC latches, however, a latching failure can occur internally even if the input is stable, as in the cases shown in Fig. 7. The position of the second stage (SN or SP) in a precharged latch is very similar to that of the first stage in a non-precharged latch in Fig. 6(a), so a latching failure can occur. A failure can be defined for a precharged latch as a high output of an n-latch falling |Vpth| below Vdd, or a low output of a p-latch rising Vnth above ground. In these cases, the p- or n-transistor in the next precharged latch will have leakage current. As a result, if the failures are not serious enough, the chip may work at a high frequency but not at a low frequency, which we have observed. According to simulations, for unsized precharged p- and n-latches in a 0.8 µm CMOS process with typical parameters [3], the maximum clock slopes before failures occur are 3.6 ns and 6.1 ns respectively. The p-latch is worse because it is precharged by an n-transistor rather than by a p-transistor as in an n-latch. These figures are dramatically reduced to 1.1 ns and 1.2 ns respectively when the sizes of the transistors marked by dots are increased by a factor of 3. If the transistor parameters deviate from the typical values, the figures could be even lower. To avoid such internal latching failures, TSPC-2 type [2] precharged latches are recommended; they are redrawn in Fig. 8 and will be used later.

Failure between two TSPC latches: We will now discuss only the cases between two non-precharged latches and from a precharged latch to a non-precharged latch, since the other cases have already been covered above. The worst cases occur when the first latch is in the worst unlatching state and the second latch is in the worst latching state; these are shown in Figs. 9 and 10 for a P-N and an N-P combination respectively. Actually, only the pair of stages in the dashed boxes is relevant. The two figures cover both the case of two non-precharged latches and the case of a precharged latch followed by a non-precharged latch, because the relevant circuit parts are identical. As above, a failure can be defined as a high output from the dashed box of Fig. 9 falling |Vpth| below Vdd, or a low output from the dashed box of Fig. 10 rising Vnth above ground. This is because the last SN stage in Fig. 9 and the last SP stage in Fig. 10 can only latch a low output and a high output respectively. When latched, their noise margins equal only the threshold voltages of the p-transistor (Fig. 9) and the n-transistor (Fig. 10) respectively. The failures will cause leakage current in the two stages. As before, if the failures are not serious enough, a chip will probably work at a high frequency but not at a low frequency. According to simulations, for unsized latches in a 0.8 µm CMOS process [3] with typical parameters, the maximum clock slope for both combinations in Fig. 9 and Fig. 10 is around 3.3 ns. This figure is dramatically reduced to 1.2 ns for both combinations when the sizes of the transistors marked by dots are increased by a factor of 3. Note again that the figure could be even lower when the transistor parameters deviate from the typical values.
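The failure definition used for Figs. 7, 9 and 10 amounts to a noise-margin check against the threshold of the transistor that the degraded node drives. A minimal sketch follows; the function names and threshold values are hypothetical, and a real check would use body-effect-adjusted thresholds from the process model.

```python
def leakage_margin_v(v_out: float, latched_high: bool, vdd: float,
                     vnth: float, vpth_abs: float) -> float:
    """Voltage margin left before the driven stage starts to leak (hypothetical helper).

    A degraded high must stay above Vdd - |Vpth| (otherwise the p-transistor it
    drives turns on); a degraded low must stay below Vnth (otherwise the
    n-transistor it drives turns on).
    """
    if latched_high:
        return v_out - (vdd - vpth_abs)
    return vnth - v_out


def is_latching_failure(v_out, latched_high, vdd, vnth, vpth_abs):
    """Failure as defined in the text: the margin has been used up."""
    return leakage_margin_v(v_out, latched_high, vdd, vnth, vpth_abs) <= 0.0


# Hypothetical thresholds: Vdd = 5 V, Vnth = 0.8 V, |Vpth| = 0.9 V
print(is_latching_failure(4.5, True, 5.0, 0.8, 0.9))   # high drooped by 0.5 V
print(is_latching_failure(0.9, False, 5.0, 0.8, 0.9))  # low raised by 0.9 V
```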

We believe that the dominant latching failures have been covered by Figs. 7, 9 and 10. Based on these analyses, we obtain the following rules of thumb for better circuit robustness. (1) The critical transistors marked by dots in Figs. 7, 9 and 10 should not be over-sized. (2) For a precharged TSPC latch, the result of an imperfect latching shows up only on its output node. In a pure precharged n-p pipeline, a p-latch has a higher risk of latching failure, which gives an imperfect low output of more than Vnth. The following n-latch is very sensitive to such an imperfect low output. Therefore, from the robustness point of view, the p-latches had better be non-precharged. (3) For a non-precharged latch, the result of an imperfect latching shows up first on its intermediate node. Any method that stabilizes this node will reduce the risk of latching failure, as will be described in the next section. (4) Static arrangements using cross coupling or a feedback loop to stabilize input and output will certainly improve slope insensitivity. This means that static performance is also useful for improving circuit robustness, which will be introduced in section VIII.

IV. A SINGLE-STAGE TSPC FULL-LATCH 

In order to get rid of the speed bottleneck of a TSPC pipeline, we propose a single-stage TSPC full-latch following a precharged n-block, which forms a favorable configuration in a pipeline, shown in Fig. 11. The TSPC full-latch, marked by the dashed box, is formed by introducing an extra n-transistor into the original SP stage. The added n-transistor is controlled by the precharged node signal of the preceding n-block. This control signal behaves like an inverted clock but is data-dependent during its evaluation phase. Both the p- and n-branches of the full-latch now become non-conductive during the high clock phase and data-dependently conductive during the low clock phase. It works correctly with input data of both one and zero. The single-stage TSPC full-latch has a number of advantages. First, the data is fully latched at the output node, so the succeeding stage does not have to be precharged. Second, whether or not the succeeding stage is precharged, an inverter can be placed between them to generate complementary outputs. Third, the output node is a three-state node, just like that of a C²MOS stage [5], which is useful in, for example, driving a bus. The critical delay path of the full-latch is still the p-branch, the same as in the original SP stage, and the extra load on the precharged node of the preceding n-block is quite small. When complementary outputs are generated, the overall speed is therefore clearly improved. Finally, the full-latch is insensitive to an imperfect output of the precharged n-latch. Compared with the original non-precharged TSPC latch (SP+SP), it has no intermediate latching node, and the output after the inverter is quite robust, never giving an imperfect low state to its succeeding precharged n-latch. If a TSPC-2 type precharged n-latch is used, see Fig. 11(b), it will be even more robust.
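The claim that both branches of the full-latch are off during the high clock phase can be checked with a switch-level toy model. We assume (our assumption, not a transistor-level reproduction of Fig. 11) that the precharged n-block is a PN stage followed by an SN stage, that its n-logic is a single input transistor, and that the extra n-transistor is gated by the PN stage's precharged node x. Only steady-state values at the end of each phase are modelled; in this model the brief moment at the start of the high phase, before x or y has switched, can only pull on an output that is already low. Names such as `Nodes`, `phase` and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nodes:
    x: int = 1  # precharged node of the PN stage
    y: int = 0  # SN-stage output = full-latch input
    q: int = 1  # full-latch output; keeps its charge while both branches are off


def phase(clk: int, d: int, s: Nodes):
    """Steady-state switch-level values at the end of one clock phase."""
    # PN stage (n-logic assumed to be one input transistor): precharged high while
    # clk = 0; while clk = 1 it can only be discharged, which happens when d = 1.
    x = 1 if clk == 0 else s.x & (1 - d)
    # SN stage: p pull-up gated by x; pull-down = clocked n in series with n gated by x.
    if x == 0:
        y = 1
    elif clk == 1:
        y = 0
    else:
        y = s.y  # both branches off: the SN output is held
    # Full-latch: SP pull-up (p gated by y in series with clocked p) and the modified
    # pull-down (n gated by y in series with the extra n gated by x).
    pull_up = y == 0 and clk == 0
    pull_down = y == 1 and x == 1
    q = 1 if pull_up else 0 if pull_down else s.q
    return Nodes(x, y, q), pull_up, pull_down


def run(data):
    """One datum per cycle: high phase (n-block evaluates), then low phase."""
    s = Nodes()
    outputs, conducts_when_high = [], []
    for d in data:
        s, up, down = phase(1, d, s)
        conducts_when_high.append(up or down)
        s, _, _ = phase(0, d, s)
        outputs.append(s.q)
    return outputs, conducts_when_high


outs, high = run([1, 0, 0, 1, 1, 0])
print(outs, high)
```

In this model the output is the complement of the evaluated data and is driven only in the low phase; in the high phase neither branch conducts, so the node floats (three-state) and keeps its charge.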

In section II, we mentioned two different kinds of TSPC latches, precharged and static (non-precharged). While the precharged version offers a speed advantage due to its small fan-in, the non-precharged version has a low sensitivity to noise and to imperfect inputs but, as mentioned above, a latching failure can first appear at its intermediate node. One modification which can stabilize the intermediate node of a non-precharged TSPC latch has been reported before [6], in which an extra transistor controlled by the latch output is introduced, see Fig. 12. When the intermediate node is low in a p-latch or high in an n-latch, its state is well kept by the feedback, avoiding a latching failure. However, this solution has two drawbacks. First, to switch the intermediate node to high in a p-latch or to low in an n-latch, the first stage has to fight against the feedback loop, which reduces the speed, particularly for a p-latch. Second, the transistor adds an extra load to the output of the latch. Instead, we propose an alternative which does the same job without these drawbacks and, at the same time, can be used together with the single-stage TSPC full-latch. In Fig. 13, the original static TSPC p- and n-latches are modified into three-state static latches. A p-transistor is used to charge the intermediate node of an n-latch high during the low clock phase, and an n-transistor is used to discharge the intermediate node of a p-latch low during the high clock phase. If the purpose is only to stabilize intermediate nodes, minimum-size precharging transistors can be used (in this paper, the mark * always denotes a minimum size). A pipeline formed by the modified non-precharged latches will be insensitive to clock slope.

The single-stage TSPC full-latch can be used after such a modified n-latch to form a favorable configuration in a pipeline, as shown in Fig. 14. In this case, the size of the precharging p-transistor should be chosen to satisfy the pull-down speed of the single-stage full-latch, but over-sizing should be avoided to prevent latching failure. Instead, the sizes of the p-transistors in the first stage of the modified latch can be minimized, since they are only used to prevent charge sharing.

The same kind of modification can be applied to the so-called split-output latches (named split-latches in the following), which were first introduced in [2] and are shown in Fig. 15. The advantage of this kind of latch is that only a single clocked transistor is used, which minimizes the clock-related power consumption, although the sizes of the half-swing-controlled transistors in the output stages should be doubled. Simulations show that the n-split-latch has more or less the same performance as a non-precharged n-latch down to a 3 V power supply. The combination of a modified n-split-latch with the single-stage full-latch is shown in Fig. 16. In this configuration, only three clocked transistors are used. The input p-transistor(s) can be minimized or even eliminated completely. Simulations show that the n-transistor in the output stage of the n-split-latch can receive a half-swing through charge sharing even if the *-marked p-transistor is missing, which works well down to a 3 V power supply. Note that in this case the precharging p-transistor also protects the intermediate node from latching failure, but its over-sizing has very little impact on the same kind of latching failure as that of a precharged n-latch, so the robustness of the flipflop is increased, as will be shown later.

The single-stage TSPC full-latch can also be made from an SN stage and placed after a precharged p-block or a modified static p-block. These two options are shown in Fig. 17. Of course, these are not favorable configurations for a high-throughput pipeline, but they could find other applications, e.g. in double-edge-triggered flipflops. In the following, we shall call the two types of single-stage full-latches the p-full-latch and the n-full-latch respectively.

V. A ROBUST DOUBLE-PIPELINE 

Double-edge-triggered flipflops have been discussed earlier [7]; they can be used to construct a so-called double pipeline. If the positive and negative flipflops mentioned above are arranged properly, a double pipeline can be formed easily. Fig. 18(a) shows a double pipeline formed by two such flipflop lines starting and ending with opposite-type blocks. Note that the two output blocks (p-p and n-n) must be three-state-output blocks, so that when one of the two is active the other causes no conflict. The advantage of a double pipeline is that, seen from outside, it has a double data rate, while inside, each line works under a clock of half the total data rate and thus needs only half the speed, except for the input (demultiplexer) and output (multiplexer) stages, which still need full speed. Therefore, the speed of the double pipeline shown in Fig. 18(a) will be limited by the marked input and output stages. It would be preferable if each of them contained only a single stage, like the one shown in Fig. 18(b).
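Behaviourally, the double pipeline is a demultiplex-process-multiplex structure. The sketch below is an abstraction under our own assumptions: consecutive data items alternate between the two lines (as if one line captures on one clock edge and the other on the opposite edge), each line applies the same hypothetical `stage` function, and the three-state output blocks are modelled as simply taking turns. It shows why each line only has to sustain half the external data rate while the input and output stages still see every item.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def demultiplex(stream: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Input stage: alternate items go to line A and line B."""
    return list(stream[0::2]), list(stream[1::2])


def multiplex(line_a: Sequence[U], line_b: Sequence[U]) -> List[U]:
    """Output stage: the two three-state outputs take turns driving the output."""
    out: List[U] = []
    for i in range(max(len(line_a), len(line_b))):
        if i < len(line_a):
            out.append(line_a[i])
        if i < len(line_b):
            out.append(line_b[i])
    return out


def double_pipeline(stream: Sequence[T], stage: Callable[[T], U]) -> List[U]:
    line_a, line_b = demultiplex(stream)
    # Each line receives every second item, i.e. half the external data rate.
    return multiplex([stage(v) for v in line_a], [stage(v) for v in line_b])


data = list(range(8))
a, b = demultiplex(data)
print(len(data), len(a), len(b))  # items seen outside vs. per line
print(double_pipeline(data, lambda v: v ^ 1) == [v ^ 1 for v in data])
```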

The single-stage n-full-latch and p-full-latch fit this application perfectly, since each basically presents only a single stage delay, and these delays are approximately the same for the n- and p-full-latches, so they can be used as the output stages of the double pipeline shown in Fig. 19. Note that although the input stages in Fig. 19 are identical to the output stages, they are controlled by the precharged node signal of their succeeding blocks rather than of their preceding blocks and, therefore, they are not strictly full-latches. However, the input p-stage or n-stage presents a much shorter hold time than a simple SP or SN stage, which could otherwise be used at the front to gain speed. A short hold time is important for robust input latching. If the input changes from low to high immediately after a positive clock edge, a simple SP stage will pull its output low quickly enough to stop the evaluation of the succeeding precharged stage, as discussed in [2]. The input p-stage shown in Fig. 19, however, uses the evaluation information of the precharged node (its transition from high to low) to prevent the output to the precharged stage from going from high to low. Additionally, the extra series n-transistor in the input p-stage also delays the high-to-low output transition. For the combination shown in Fig. 19, the hold time is reduced from 120 ps to 30 ps by replacing the SP stage with the input p-stage, and from 180 ps to -40 ps by replacing the SN stage with the input n-stage. In the simulations, clock and signal slopes are both 200 ps in order to show the impact of the circuit itself (the improvements are similar when the slopes are 300 ps). All transistor widths are 2 µm except the two in the precharged p-stage, which are 4 µm (the middle one) and 8 µm (the top one) respectively. As long as the delay of the unclocked branch of the input stage (p- or n-) is larger than the evaluation delay of the precharged stage (n- or p-), the improvement will be dramatic, which shows the design direction and the robustness of the proposed input configuration. In this sense, the input stages can be seen as full-latches.
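The reported hold-time figures and the design rule that follows them can be summarised in a few lines. The hold-time numbers are those quoted above for Fig. 19; the function names are ours and the delays passed to the design-rule helper in the checks are hypothetical.

```python
# Hold times reported above for Fig. 19, in ps: (simple stage, proposed input stage)
REPORTED_HOLD_PS = {
    "SP -> input p-stage": (120, 30),
    "SN -> input n-stage": (180, -40),
}


def hold_reduction_ps(before_ps: float, after_ps: float) -> float:
    """How much the hold-time requirement shrinks. A negative hold time means
    the input may already change slightly before the clock edge."""
    return before_ps - after_ps


def behaves_as_full_latch(unclocked_branch_delay_ps: float,
                          precharged_eval_delay_ps: float) -> bool:
    """Design rule stated above: the unclocked branch of the input stage must be
    slower than the evaluation of the precharged stage it feeds."""
    return unclocked_branch_delay_ps > precharged_eval_delay_ps


for name, (before, after) in REPORTED_HOLD_PS.items():
    print(f"{name}: {before} ps -> {after} ps, "
          f"reduced by {hold_reduction_ps(before, after)} ps")
```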



VI. DUAL-RAIL LATCHES 



Precharging is involved in all of the single-stage full-latches above. For circuits with low activity rates, completely non-precharged latches are preferred from the viewpoint of low power consumption. If complementary inputs are available, there are efficient ways to construct complementary-output latches, i.e. dual-rail latches, which have already been reported in the literature [8]. The CVSL-type latches described in [8] inherently give complementary outputs. The basic structures of the p- and n-latches of this type are shown in Fig. 20. Note that the positions of the clocked transistors can be exchanged with those of the input transistors, but if the clocked transistors are connected directly to power or ground (charge sharing is not a concern in this case), they may be enlarged without increasing the fan-in of the latch. However, the problem with these latches is that their function depends on the transconductance ratio of the p- and n-transistors, since one of the two input branches, n- or p-, has to fight against its complementary branch, p- or n-, to start the regeneration process. If the ratio is not properly designed, or changes with process and/or temperature, the latches may stop working or present an unexpectedly large delay. For example, if Wn = 2 µm (the minimum width in a 0.8 µm process, corresponding to an effective width of only 0.84 µm), the n-latch can never work properly and will stop working when Wp ≥ 3.4 µm. By contrast, in the p-latch the n-size had better be minimized: if the n-size is increased to 4 µm in the p-latch, the p-sizes have to be more than 20 µm for proper operation, which makes the latch unnecessarily large. Great care must be taken in designing these latches, particularly when logic is included. In the following, therefore, we present two alternatives. In the first, completely ratio-insensitive cross-coupled latches are introduced and, in the second, fast flipflop arrangements are proposed.

The completely ratio-insensitive cross p- and n-latches are shown in Fig. 21. Each of them is formed by cross-connecting two identical TSPC full-latch stages of the kind described above. They are ratio-insensitive because there is no conflict between the n- and p-branches. At first glance, they seem to present a larger fan-in than CVSL-type latches but, in fact, this is not so obvious. To be fair, we compare their delays under the same fan-in and load. Before that, we first obtained the best ratio, corresponding to the least delay, for the CVSL-type p- and n-latches and kept that ratio for different fan-in values. For the cross p-latch the p-size is fixed at twice the n-size, and for the cross n-latch the p- and n-sizes are kept equal for different fan-in values. A minimum inverter (Wp = Wn = 2 µm) is used as the load for every simulated latch. Results are shown in Fig. 23. The cross-coupled TSPC p-latch is clearly better than the CVSL-type p-latch, see Fig. 23(a): it is not only ratio-insensitive but also has less fan-in for the same delay. Conversely, the cross n-latch presents a larger delay than the CVSL-type n-latch but has the advantage of being ratio-insensitive. A better combination could thus be a CVSL-type n-latch plus a cross p-latch, shown in Fig. 24.

There is an even better alternative, a high-speed arrangement. We found that when a
flipflop is formed by CVSL-type n- and p-latches, one of them does not have to be a
full-latch. For example, the p-latch can be replaced by just two separate TSPC SP-stages,
as shown in Fig. 25(a). The speed bottleneck is thus removed immediately. Although
during the high clock phase a high input to an SP-stage leads directly to a low output
(no latching at all), this low output has no impact on the CVSL-type
n-latch once that latch has flipped. A chain of this kind is safe because, in the previous
CVSL-type n-latch, the pull-up always occurs later than the pull-down. The safe
condition is that the pull-up delay of the previous CVSL-type n-latch plus the pull-down
delay of the SP-stage must exceed the flip delay of the next CVSL-type n-latch.
The speed improvement is significant while the power consumption is even lower. The
delay of the p-latch now becomes much less than that of the n-latch. The size of the
p-latch can therefore be minimized, giving even less fan-in than a CVSL-type
p-latch would. A chain of this kind can work at up to twice the clock rate of a complete
CVSL-type latch chain. Note that logic can still be included in both latches and that the
p-latch is ratio-insensitive. If the p-latch is just a passing stage in a high-speed pipeline, it
can be simplified further. As the CVSL-type n-latch has pull-up driving capability in the
latching phase, the two p-transistors can be borrowed by the two SP-stages, as shown in
Fig. 25(b). If such borrowing were done with a CVSL-type p-latch, it would require a
significant size increase of the CVSL-type n-latch because of the ratio
dependence. There is no such problem in Fig. 25(b). As long as the succeeding
stage is a CVSL-type n-latch, a cross-coupled TSPC n-latch or a precharged TSPC
n-latch, the arrangement works. This further reduces the power consumption at the maximum speed.
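The safe condition above is a plain timing inequality. The Python sketch below states it explicitly; the function names are hypothetical, and the delay values in the usage lines are placeholders chosen only to exercise the inequality, not simulation results from this paper.

```python
def sp_chain_is_safe(t_pullup_prev: float, t_pulldown_sp: float,
                     t_flip_next: float) -> bool:
    """Check the safe condition for an n-latch -> (SP+SP) -> n-latch chain.

    The pull-up delay of the previous CVSL-type n-latch plus the pull-down
    delay of the SP-stage must exceed the flip delay of the next CVSL-type
    n-latch, so that the unlatched low output of the SP-stage only arrives
    after the next n-latch has already flipped. All delays share one unit.
    """
    return t_pullup_prev + t_pulldown_sp > t_flip_next


def min_sp_pulldown(t_pullup_prev: float, t_flip_next: float) -> float:
    """Lower bound on the SP-stage pull-down delay implied by the inequality.

    The inequality is strict, so the actual delay must exceed this bound.
    """
    return max(0.0, t_flip_next - t_pullup_prev)


# Hypothetical delays in ns, chosen only to exercise the inequality.
print(sp_chain_is_safe(0.25, 0.15, 0.30))  # True:  0.40 > 0.30
print(sp_chain_is_safe(0.10, 0.10, 0.30))  # False: 0.20 < 0.30
print(min_sp_pulldown(0.10, 0.30))         # roughly 0.2
```

The second helper corresponds to the case where the next n-latch carries heavy logic: its flip delay grows, and the SP-stage pull-down delay must grow with it.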

Two issues need to be mentioned. First, for safety reasons, the n-transistors
marked by * in the SP-stages should be minimized. If the flip delay of the next
CVSL-type n-latch is so large (with heavy logic, for example) that even the minimized
n-transistor is too fast, the two SP-stages can be modified into the type shown at the
right side of Fig. 25(a). Such a modification increases the pull-down delay to the
desired value without increasing either fan-in or power consumption. Second,
when a chain of this kind has to be terminated with two SP-stages as in Fig.
25(b), latched data can be obtained by cascading one more SP-stage (for a single-rail output)
or two more SP-stages (for a dual-rail output) after one or both outputs. We will
not mention these two solutions again when similar circuits appear later.

Based on the above principle, a second arrangement is to use two separate TSPC
SN-stages together with a cross-coupled TSPC p-latch, as shown in Fig. 26(a). In this
arrangement, the delay of the n-latch is reduced so much (to half the delay of a CVSL-type
n-latch) that more logic operations can be included in the n-latch, although this logic must
be complementary. A third arrangement could be to cascade the two SN-stages and to
place only single-rail logic in the first SN-stage, as shown in Fig. 26(b).

VII. Single-transistor-clocked latches

To reduce power consumption, it would be desirable for a latch to use only a single
clocked transistor, which has not been reported so far. Here we do not mean
things like a pass n-transistor plus a buffer, but a full-latch with complementary input and
output. In the following, we propose two kinds of single-transistor-clocked (STC)
latches, STC-1 and STC-2. The first kind (STC-1), which evolved from the CVSL-type
latches, is shown in Fig. 27.

At first thought, it seems natural to cascade the two latches to form a
flipflop. However, doing so is very risky. The problem is not the
charge sharing between the output nodes and the common node, which is
automatically overcome by the pull-up (for an n-latch) or pull-down (for a p-latch)
capability. The problem is the transparency between the two output nodes. To
avoid such transparency, the two input transistors must not conduct
simultaneously. This means that for an STC-1 n-latch, the high-to-low input transition must
precede the low-to-high input transition. Unfortunately, an STC-1 p-latch produces its
output transitions in the opposite order. Therefore, the two latches cannot be
directly cascaded. Apart from this, the p-latch presents a much larger delay than the
n-latch, for the same reason as mentioned before. To make use of the STC-1 n-latch,
we found that the n-latches in the fast flipflop arrangements of Fig. 25 in the last section
can be replaced by the STC-1 n-latch, as shown in Fig. 28. The two
SP-stages in Figs. 28(a) and (b) always give a high-to-low transition first, so the
succeeding STC-1 n-latch works safely. This leads to both high speed and a small clock load.
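The transition-order requirement for the STC-1 n-latch can be illustrated with a rough linear-ramp model of its two complementary inputs. This sketch assumes ideal ramps of equal duration and treats an input n-transistor as conducting whenever its gate is above a threshold `vtn`; the 5 V supply and 0.8 V threshold defaults are placeholders, not parameters of the process used here.

```python
def conduction_overlap(t_fall_start: float, t_rise_start: float,
                       t_ramp: float, vdd: float = 5.0, vtn: float = 0.8) -> float:
    """Time during which both input n-transistors of an STC-1 n-latch conduct.

    Each input is modelled as an ideal linear ramp of duration t_ramp, and an
    input transistor is assumed to conduct whenever its gate is above vtn.
    A result greater than zero means the two output nodes are briefly
    connected, i.e. the transparency the latch must avoid.
    """
    # Falling input: from vdd down to 0, stops conducting when it passes vtn.
    t_fall_off = t_fall_start + t_ramp * (1.0 - vtn / vdd)
    # Rising input: from 0 up to vdd, starts conducting when it passes vtn.
    t_rise_on = t_rise_start + t_ramp * (vtn / vdd)
    return max(0.0, t_fall_off - t_rise_on)


# High-to-low first (the safe order): no overlap.
print(conduction_overlap(t_fall_start=0.0, t_rise_start=0.5, t_ramp=0.2))  # 0.0
# Both inputs switching together: a short overlap.
print(conduction_overlap(t_fall_start=0.0, t_rise_start=0.0, t_ramp=0.2))
# Low-to-high first (the unsafe order): a longer overlap.
print(conduction_overlap(t_fall_start=0.5, t_rise_start=0.0, t_ramp=0.2))
```

In this picture, even simultaneous switching leaves a small overlap, which is why a guaranteed high-to-low-first order from the preceding stage matters.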

This can still be improved, since the p-latch uses two clocked transistors and
p-transistors are involved in the logic. To have only a single clocked transistor and only
n-transistors in the logic of both latches, we propose a completely new kind of latch, the
second kind of single-transistor-clocked (STC-2) latch. The p-latch of this kind, the STC-2
p-latch, can be used together with the STC-1 n-latch to form single-transistor-clocked-latch
(STCL) flipflops, see Fig. 29. Note that since the STC-2 latch is insensitive to the
input transition order, an inverter can be used to convert the dual-rail positive-edge
triggered flipflop in Fig. 29(a) into the single-rail positive-edge triggered flipflop in Fig.
29(b).
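At the logic level, cascading a p-latch (transparent while the clock is low) with an n-latch (transparent while the clock is high) yields a positive-edge-triggered flipflop, which is how the STC-2 p-latch and STC-1 n-latch are combined in Fig. 29(a). The following Python model is a behavioural sketch only; the class and function names are hypothetical, and all transistor-level effects discussed in this section are ignored.

```python
from typing import List


class PNLatchPair:
    """Logic-level model of a p-latch followed by an n-latch.

    The p-latch is transparent while the clock is low and holds while it is
    high; the n-latch does the opposite. The pair therefore behaves as a
    positive-edge-triggered flipflop. Ratios, charge sharing and transition
    order are deliberately ignored.
    """

    def __init__(self, init: int = 0) -> None:
        self.p_state = init  # output of the p-latch (first stage)
        self.q = init        # output of the n-latch (second stage)

    def step(self, clk: int, d: int) -> int:
        """Apply one sample of (clock level, data input) and return Q."""
        if clk == 0:
            self.p_state = d       # p-latch follows D, n-latch holds Q
        else:
            self.q = self.p_state  # p-latch holds, n-latch passes its value
        return self.q


def simulate(clk: List[int], d: List[int]) -> List[int]:
    ff = PNLatchPair()
    return [ff.step(c, x) for c, x in zip(clk, d)]


# D changes while the clock is high (samples 1 and 4) are not passed to Q;
# Q only takes the value D had just before each rising edge.
print(simulate(clk=[0, 1, 1, 0, 1, 0, 1], d=[1, 0, 0, 0, 1, 1, 1]))
# -> [0, 1, 1, 1, 0, 0, 1]
```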

The STC-2 p-latch looks similar to the STC-1 n-latch; for example, both cross-coupled
pairs are formed by p-transistors. However, they are quite different. The basic
function of the STC-2 p-latch is similar to that of the two SP-stages, i.e. to transfer data
during the low clock phase, to latch low-output data during the high clock phase, and to latch
high-output data at the beginning of the high clock phase. The input transition order to the
STC-2 p-latch is not important, although the STC-1 n-latch always gives the high-to-low
transition first, which is ideal. When the clock falls, the common node of the STC-2
p-latch is charged up to a voltage that depends on the ratio between the conductances of
the clocked transistor and the on-branch. Since the on-branch is formed by a p-transistor
and an n-transistor in series, both of minimum size, the required ratio is easily
satisfied. The output whose n-transistor is on is kept low, and the output whose
n-transistor is off is pulled high, which turns off the p-transistor on the low-output side.
Finally, both outputs are firmly defined by the pull-up and pull-down
branches. Note that the delay is small because the latch has much less of a ratio
problem and the delay is caused by only a single transition (low-to-high), not by two
transitions as in the p-latch of Fig. 27. When the clock rises, if the inputs remain the
same, the output states are kept, although the high output loses its pull-up capability.
If the inputs change to opposite states, both outputs become low after a certain delay.
The original low output does not share the charge on the common node, since the gate
and the source of the originally-off p-transistor are pulled down
simultaneously with a difference almost equal to the p-threshold voltage, as confirmed by
simulation. Compared with the arrangement of two separate SP-stages, the STC-2 p-latch
uses only a single clocked transistor and only n-transistors in logic. The delay of the
high-to-low transition during the latching phase becomes longer because the
common node must also be discharged, which is favorable for the next n-latch. This delay is simulated to be
approximately twice the high-to-low delay of the n-latch, enough to
guarantee that the next n-latch flips. Note that this delay may be as long as a whole
clock cycle, so the speed is not affected. The size of the clocked p-transistor in the STC-2
p-latch can be used to control the delay ratio between the low-to-high and high-to-low
transitions. The fan-in of the p-latch is minimal even if logic is included, giving less
load to the n-latch and making the flipflop very fast. The STCL flipflop is therefore
superior both in high speed and in low power consumption. When a chain of this kind has
to be terminated with an STC-2 p-latch, besides the methods mentioned in section VI, the
unclocked termination stage indicated in Fig. 29(c) can be used; it is simple and
can be used for other similar cases as well.
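The dependence of the common-node level on the conductance ratio can be pictured, very roughly, as a resistive divider between the clocked p-transistor and the on-branch (a p- and an n-transistor in series). The sketch below assumes linear conductances and placeholder relative values (a minimum n-transistor taken as 1.0, a minimum p-transistor as 0.5); real transistors are nonlinear and the on-branch p-transistor soon turns off, so this only illustrates why a weak series branch makes the ratio easy to satisfy.

```python
def series_conductance(*g: float) -> float:
    """Conductance of elements connected in series: 1 / sum(1 / g_i)."""
    return 1.0 / sum(1.0 / x for x in g)


def common_node_level(vdd: float, g_clocked: float, g_on_branch: float) -> float:
    """Linear-divider estimate of the STC-2 common-node voltage after the clock falls.

    The clocked p-transistor (g_clocked) pulls the node up while the on-branch
    (g_on_branch) pulls it down, so the node sits at a divider voltage.
    """
    return vdd * g_clocked / (g_clocked + g_on_branch)


# Placeholder relative conductances: minimum n = 1.0, minimum p = 0.5.
g_branch = series_conductance(0.5, 1.0)  # minimum p and n in series: 1/3
for g_clk in (0.5, 1.0, 2.0):
    level = common_node_level(5.0, g_clk, g_branch)
    print(f"g_clocked = {g_clk}: common node at about {level:.2f} V of 5 V")
```

In this crude picture, making the clocked device stronger relative to the series branch pulls the estimated level closer to the supply.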



VIII. Static TSPC flipflops

TSPC was introduced as a high-speed dynamic circuit technique. In that case, a high-frequency
clock was assumed, which is reasonable for most high-speed circuits.
However, in special cases static operation is very useful. By "static" we mean that
the clock can be at zero frequency (as opposed to "dynamic"), which should be
distinguished from the concept of non-precharged. For example, in the MSB circuit of an
asynchronous counter, the toggle frequency may fall well below the low-frequency limit of a
dynamic circuit. To reduce the power consumption of a large chip, part of the chip circuitry
may stay idle (at zero clock frequency). Moreover, a static flipflop has a large noise
margin and low clock-slope sensitivity. Therefore, it is of great interest to introduce
static TSPC flipflops.

In many cases, it is enough for a flipflop to stay idle in the low clock phase (or the high
clock phase), and holding a clock at either low or high should not be a problem.
Therefore, a so-called semi-static TSPC flipflop would be adequate in most cases. The principle
of a semi-static divider was shown in [9], and the same principle can be used to construct
semi-static TSPC flipflops. Since logic may be included in the n-latch, it is better
to make the p-latch the static part. This can be done with a TSPC p-full-latch, as shown in
Figs. 30(a) and (b). While Fig. 30(a) completely eliminates the conflict between p- and
n-branches, Fig. 30(b) uses fewer transistors (less clock load) with very little conflict,
which poses no danger to the function as long as the size of the p-transistor in the
dashed box is kept minimum. In practice, the sizes of both p- and n-transistors in the
dashed boxes should be kept minimum to minimize the load. The gate connections (to
the half-swing nodes) in Figs. 30(a) and (b) make these transistors very weak when they are
conducting and give less load to the real output.

It is attractive to use the so-called RAM-type latches [8] (see Fig. 31) to construct a
fully-static TSPC flipflop, although the ratio issue has to be handled carefully and the
p-latch is quite slow. To reduce the clock load, we propose to use static versions of the
STC-1 p- and n-latches, which can be safely cascaded to form a fully-static flipflop,
shown in Fig. 32. This is possible because the two output nodes now have both pull-up and
pull-down capabilities. For example, in the STC-1 n-latch, if the two output nodes are
temporarily connected by the two input transistors when both inputs are high, the two
output nodes keep a voltage difference equal to the voltage drop across the two input
transistors and later recover their original logic states when the latch becomes
non-transparent again. In both flipflops, the sizes of the transistors marked by * can be
minimized. The flipflops shown in Figs. 31 and 32 are fully-static and have
complementary outputs available. However, the p-latches turn out to be much slower
than the n-latches (by more than a factor of two). It is thus necessary to replace the p-latch
with a static cross p-latch. The flipflop constructed from a static STC-1 n-latch and a static
cross p-latch is shown in Fig. 33. In the static cross p-latch, the two extra p-transistors
lock the high output and the extra n-transistor locks the low output (through one of the
two bottom n-transistors) during the high clock phase. This flipflop is significantly faster than
static flipflops constructed from pure RAM-type or pure static STC-1 latches.

Again, in most cases a semi-static flipflop might be enough. Therefore, the dynamic
flipflops using an STC-1 n-latch and two separate TSPC SP-stages (see Fig. 28) can be
modified into semi-static flipflops by replacing the dynamic STC-1 n-latch with the static
STC-1 n-latch, as shown in Fig. 34. The clock to these two flipflops can stay idle at low.

If the clock needs to stay idle in the high state, we can combine two separate TSPC
SN-stages with a static cross p-latch, similar to its dynamic version shown in Fig. 26(a). In
this case, one can simply replace the dynamic cross p-latch with a static cross p-latch.
However, to reduce the clock load, one can also modify the static cross p-latch
shown above to use two clocked transistors instead of three. Since the clocked
n-transistor is minimized, the clock load is reduced by almost a factor of two. The
semi-static flipflops with both the unmodified and the modified static cross latches are shown in Fig.
35. Using only one clocked p-transistor at the top is possible because the two
SN-stages always give a low-to-high transition first to the static cross p-latch, which
guarantees the decoupling of the two output nodes. It is risky to do so for the fully-static
flipflop shown in Fig. 33, and also risky to use the modified static cross p-latch
in the single-rail input flipflop shown in Fig. 26(b).

To combine all the advantages, such as high speed, low power consumption, small
fan-in and full static operation, in a single flipflop, we finally propose in this section a static
version of the STC-2 p-latch and a static STCL flipflop. First, let us go back to the dynamic
STCL flipflop in section VII (see Fig. 29). The n-latch in Fig. 29 can clearly be
replaced by the static STC-1 n-latch. The task is to convert the STC-2 p-latch from
dynamic to static. This can be done by adding a minimum inverter and two minimum
n-transistors to the dynamic STC-2 p-latch, as shown in Fig. 36, where the static STC-2
p-latches, together with the static STC-1 n-latches, are used in both positive- and negative-edge
triggered fully-static high-performance flipflops.

To make the dynamic STC-2 p-latch static, one only needs to prevent the low output from
floating to high. One need not worry about the high output floating to low, since this has
no impact on the next static n-latch. If the inputs to the p-latch do not change, this
condition is satisfied automatically, since the high input always pulls down the
corresponding low output. Therefore, one only needs to consider the situation where the
inputs to the p-latch flip during the high clock phase. In this case, the low output
loses its pull-down capability and might float to high, although the original high output is
now pulled down by the new high input. The static STC-2 p-latch in Fig. 36 prevents
this. Since the original high output is pulled down, the common node is
forced down, and the inverter gives a high output to the two extra n-transistors, which
pull all other internal nodes down firmly. Only when the clock goes low is the common
node charged high; the inverter then gives a low output that turns off the two extra
n-transistors, and the latch returns to normal operation. Compared with the dynamic version of
the p-latch, the extra n-transistor must now be included in the conductance ratio.
However, this is not a problem: a clocked p-transistor of twice the minimum size
makes the latch work well, provided that all other transistors are minimized. The static
STCL flipflop, like its dynamic version, has minimal clock load and fan-in for both
p- and n-latches and is superior in speed and low power consumption. The semi-static version
of the STCL flipflop can be formed by combining dynamic and static versions of the
STC-2 p- and STC-1 n-latches. Positive-edge triggered semi-static flipflops suited to either
idle-high or idle-low clocks are given in Fig. 37. As in Fig. 36(b), the two
dual-rail flipflops in Fig. 37 can also be converted into single-rail flipflops by using input
inverters.
IX. PERFORMANCE COMPARISONS

The performances of the flipflops introduced above are compared through simulation in
this section. For each of them, three identical flipflops are cascaded to simulate a
realistic input driving source and output load, but only the middle one gives the results.
Complementary outputs are always assumed when calculating the worst delays. Typical
SPICE parameters of a 0.8 µm CMOS single-poly double-metal process are used. To
make fair comparisons, the circuits are not sized individually. Instead, the widths of
all n-transistors and p-transistors are fixed to 3 µm and 6 µm respectively. The minimum
width of the process, 2 µm, is given to the transistors marked by "*", which means that
their sizes are minimized, e.g. the transistors in static locking loops. Three quantities are
compared: the worst delay (WD), the maximum clock slope (MCS) and the power
dissipation (PD). Only dynamic power dissipation is taken into account. It is
calculated node by node according to the power weights described
below. The first is the activity rate A of a node: A = 1.0 for the gate of a clocked
transistor, A = 0.5 for a precharged node and A ≤ 0.5 for a normal node. The second is the
swing S of a node: S = 1.0 for an output node, S = 0.7 for a node between same-type
transistors (body effect) and S = 0 for a power or ground node. The third is the
capacitance C, which is calculated by summing the gate-to-substrate and
drain/source-to-substrate capacitances connected to the node. In the 0.8 µm CMOS process, the
n(or p)-gate-to-substrate and n-drain(source)-to-substrate capacitances are quite
similar, so both are weighted 1.0 for a minimum-width (2 µm) transistor, defined as a
unit capacitance, while the p-drain(or source)-to-substrate and the n(or p)-gate-to-drain(or source)
capacitances are weighted 1.2 and 0.2 respectively. The contribution of
a gate-to-source capacitance is added directly to the gate node. The contribution of a
gate-to-drain capacitance is calculated in two ways. First, if the gate transition directly
leads to a drain transition, its contribution is multiplied by a factor of 4, because it is not
only discharged but also recharged oppositely (a factor of 2), and such a discharge-recharge
happens at every transition rather than every two transitions (another factor of 2), as for
a substrate-related capacitance. The total dynamic power dissipation P_d (per Hz) is
then calculated by

$$P_d = \sum_{i=1}^{n} A_i S_i^2 C_i$$

where i is the node number from 1 to n.
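The node-by-node weighting $P_d = \sum_i A_i S_i^2 C_i$ can be sketched in Python as follows, using the capacitance weights given above for minimum-width devices (width scaling of larger devices and the factor-of-4 rule for gate-to-drain capacitance are not modelled). Nodes with a fixed activity (clocked gates, precharged nodes) are collected into a constant term and data nodes into a term proportional to A; this is one way to arrive at the c + bA form of the power entries in the tables, but that split is an assumption here. The node list is hypothetical and does not correspond to any circuit in the figures.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Capacitance weights per minimum-width (2 um) transistor, as given in the text.
GATE_CAP = 1.0    # n- or p-gate-to-substrate
N_DIFF_CAP = 1.0  # n-drain/source-to-substrate
P_DIFF_CAP = 1.2  # p-drain/source-to-substrate


def node_cap(gates: int = 0, n_diffs: int = 0, p_diffs: int = 0) -> float:
    """Capacitance C_i, in unit capacitances, of a node touching minimum-width devices."""
    return gates * GATE_CAP + n_diffs * N_DIFF_CAP + p_diffs * P_DIFF_CAP


@dataclass
class Node:
    swing: float                      # S_i: 1.0 output, 0.7 same-type stack, 0 supply
    cap: float                        # C_i in unit capacitances
    activity: Optional[float] = None  # fixed A_i (1.0 clocked gate, 0.5 precharged);
                                      # None marks a data node whose A_i equals A


def power_terms(nodes: Iterable[Node]) -> Tuple[float, float]:
    """Return (c, b) such that P_d = sum(A_i * S_i**2 * C_i) = c + b * A."""
    c = b = 0.0
    for node in nodes:
        weight = node.swing ** 2 * node.cap
        if node.activity is None:
            b += weight                  # data node: scales with the activity rate A
        else:
            c += node.activity * weight  # fixed-activity node: independent of A
    return c, b


# Hypothetical node list, not taken from any circuit in this paper.
nodes = [
    Node(swing=1.0, cap=node_cap(gates=2), activity=1.0),                        # clock line
    Node(swing=1.0, cap=node_cap(gates=1, n_diffs=1, p_diffs=1), activity=0.5),  # precharged node
    Node(swing=1.0, cap=node_cap(gates=1, n_diffs=1, p_diffs=1)),                # data output
    Node(swing=0.7, cap=node_cap(n_diffs=2)),                                    # node inside an n-stack
]
c, b = power_terms(nodes)
print(f"P_d = {c:.2f} + {b:.2f}A")  # P_d = 3.60 + 4.18A
```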
Table 1 Performance comparison of dynamic flipflops

No.  Flipflop             WD (ns)   MCS (ns)  Power dissipation            CT  T   Fig.
                                              0≤A≤0.5      A=0.1   A=0.5
1    PN-SN-PP-SP-INV      0.70 (P)  3.5       35.1+33.5A   38.5    51.9    6   14  2(a)
2    SN-SN-SP-SP-INV      0.72 (P)  2.6       11.8+73.9A   19.2    48.8    4   14  2(b)
3†   PN-SN-FL(P)-INV      0.54 (P)  4.2*      21.2+39.1A   25.1    40.8    4   12  11(a)
4†   PN/SN-FL(P)-INV      0.54 (P)  7.2*      21.6+39.1A   25.5    41.2    4   12  11(b)
5†   PSN-SN-FL(P)-INV     0.57 (P)  4.2       22.0+41.0A   26.1    42.5    4   13  14
6†   SPLIT(N)-FL(P)-INV   0.55 (P)  10.5      18.9+41.1A   23.0    39.5    3   12  16
7    CVSL(N)-CVSL(P)      0.74 (P)  50        20.8+48.9A   25.7    45.3    4   12  20
8†   CVSL(N)-CROSS(P)     0.48 (N)  3.8       18.0+65.5A   24.6    50.8    4   14  24
9†   STC1(N)-(SP+SP)      0.41 (N)  4.5       8.1+40.5A    12.2    28.4    3   11  28(a)
10†  STC1(N)/(SP+SP)      0.41 (N)  4.5       8.1+37.5A    11.9    26.9    3   9   28(b)
11†  STC1(N)-STC2(P)      0.35 (P)  4.2       9.0+42.7A    13.3    30.4    2   10  29

* After inverters; before inverters they are 2.8 ns and 5.1 ns respectively.



The comparison results for the dynamic, semi-static and fully-static flipflops
are listed in Tables 1, 2 and 3 respectively. In Table 1, flipflops 1-6 are single-rail types
and flipflops 7-11 are dual-rail types. Newly proposed circuits are marked by the sign "†".
Flipflops are constructed by cascading the stages indicated by their names, with a
connecting sign "-" between them. For TSPC precharged n-latches, PN-SN means the TSPC-1
type while PN/SN means the TSPC-2 type. The characters P and N in parentheses are used
either to identify a p-latch or an n-latch or to indicate the delay-dominant latch.
Stage SFL represents the static version of the TSPC single-stage full-latch. Flipflop 10 in
Table 1 and flipflop 16 in Table 2 are merged-stage types. WD, MCS, A, CT and T denote
the worst delay, the allowable maximum clock slope, the activity rate, the number of
clocked transistors and the total transistor count respectively. The classic master-slave
flipflop is formed by four transmission gates, four inverters and a clock buffer that provides
two-phase clocks internally [8].
Table 2 Performance comparison of semi-static flipflops

No.  Flipflop             WD (ns)   MCS (ns)  Power dissipation            CT  T   Fig.
                                              0≤A≤0.5      A=0.1   A=0.5
12   RAM(N)-CVSL(P)       0.78 (P)  50        20.8+52.8A   26.1    47.2    4   14  -
13†  PN/SN-SFL-INV        0.55 (P)  4.6       21.5+43.3A   25.8    43.2    4   14  30(b)
14†  RAM(N)-CROSS(P)      0.49 (N)  4.0       18.0+69.4A   24.9    52.7    4   16  -
15†  SSTC1(N)-(SP+SP)     0.48 (N)  5.2       8.5+44.4A    12.9    30.7    3   13  34(a)
16†  SSTC1(N)/(SP+SP)     0.47 (N)  5.5       8.5+41.4A    12.6    29.2    3   11  34(b)
17†  STC2(P)-SSTC1(N)     0.36 (P)  5.0       9.8+46.6A    14.5    33.1    2   12  37(a)
18†  SSTC2(P)-STC1(N)     0.36 (N)  5.0       9.8+52.8A    15.1    36.2    2   14  37(b)


Table 3 Performance comparison of fully-static flipflops

No.  Flipflop             WD (ns)   MCS (ns)  Power dissipation            CT  T   Fig.
                                              0≤A≤0.5      A=0.1   A=0.5
19   Classic Master-Slave 0.85      55        38.3+108A    50.1    93.3    10  18  -
20   RAM(N)-RAM(P)        0.89 (P)  50        20.8+56.8A   26.5    49.2    4   16  31
21†  SSTC1(N)-SCROSS(P)   0.75 (P)  6.5       13.7+75.4A   21.2    51.4    2   18  33
22†  SSTC1(N)-SSTC2(P)    0.36 (N)  6.2       9.8+56.8A    15.5    38.2    2   16  36



A clear tendency is that the new flipflops are faster and consume less
power than the original ones in every table, which means the improvements in
power-delay product are significant. We can divide the flipflops into four groups: the
single-rail dynamic flipflops (1-6, group 1), the dual-rail dynamic flipflops (7-11, group
2), the semi-static flipflops (12-18, group 3) and the fully-static flipflops (19-22, group
4). Their power-delay products are plotted in Figs. 38-41. In each of the four groups,
the first flipflop (or the first two) serves as the reference for comparison.
The tables are arranged so that the power-delay product generally decreases as the
number increases within each group. Therefore, the best improvements can be found by
comparing the first and the last flipflops in each group. From group 1 to group 4, in the
best cases, the delays are reduced by factors of 1.3, 2.1, 2.2 and 2.4, while the
power-delay products are reduced by factors of 1.9, 3.5, 3.4 and 6.5 respectively. This indicates
that the newly proposed flipflops feature both high speed and low power
consumption. At the same time, the maximum allowable clock slopes are generally
increased (most obviously for circuit 7, by a factor of 3) compared with the original
TSPC flipflops. Note that the flipflops using pure CVSL-type, RAM-type or classic
latches allow very long clock slopes but present the largest delays and highest power-delay
products, which makes them interesting only in special cases. Almost all new flipflops use
fewer transistors, and particularly fewer clocked transistors. The STC-1 and STC-2 latches
are clocked by only a single clocked transistor. Their advantages are not completely
captured by the tables and plots. For example, when logic needs to be included in the
latches, the unique type of flipflops (11, 17, 18 and 22) with n-transistor-only logic (in
both n- and p-latches and in both dynamic and static versions) will show an even greater
performance advantage over the others.
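The improvement factors quoted above can be recomputed from the WD and power-dissipation entries of Tables 1-3 at the average activity rate A = 0.25. The Python sketch below takes the first flipflop of each group as the reference and compares it with the fastest flipflop and with the one of lowest power-delay product (which is not always the last entry; in group 3 it is flipflop 17). Groups 1-3 reproduce the quoted factors after rounding to one decimal; for group 4, the printed entry 38.3+108A of the classic master-slave flipflop gives a power-delay factor of about 6.4, close to the quoted 6.5.

```python
# (WD in ns, c, b) for each flipflop, with power dissipation c + b*A (Tables 1-3).
ENTRIES = {
    1: (0.70, 35.1, 33.5), 2: (0.72, 11.8, 73.9), 3: (0.54, 21.2, 39.1),
    4: (0.54, 21.6, 39.1), 5: (0.57, 22.0, 41.0), 6: (0.55, 18.9, 41.1),
    7: (0.74, 20.8, 48.9), 8: (0.48, 18.0, 65.5), 9: (0.41, 8.1, 40.5),
    10: (0.41, 8.1, 37.5), 11: (0.35, 9.0, 42.7), 12: (0.78, 20.8, 52.8),
    13: (0.55, 21.5, 43.3), 14: (0.49, 18.0, 69.4), 15: (0.48, 8.5, 44.4),
    16: (0.47, 8.5, 41.4), 17: (0.36, 9.8, 46.6), 18: (0.36, 9.8, 52.8),
    19: (0.85, 38.3, 108.0), 20: (0.89, 20.8, 56.8), 21: (0.75, 13.7, 75.4),
    22: (0.36, 9.8, 56.8),
}
GROUPS = {1: range(1, 7), 2: range(7, 12), 3: range(12, 19), 4: range(19, 23)}


def pdp(no: int, activity: float = 0.25) -> float:
    """Power-delay product of flipflop `no` at activity rate A."""
    wd, c, b = ENTRIES[no]
    return wd * (c + b * activity)


def improvement(group: int, activity: float = 0.25) -> tuple:
    """(delay factor, PDP factor) of the best flipflop against the group's first one."""
    members = list(GROUPS[group])
    ref = members[0]
    fastest = min(members, key=lambda n: ENTRIES[n][0])
    lowest = min(members, key=lambda n: pdp(n, activity))
    return ENTRIES[ref][0] / ENTRIES[fastest][0], pdp(ref, activity) / pdp(lowest, activity)


for g in GROUPS:
    d, p = improvement(g)
    print(f"group {g}: delay / {d:.1f}, power-delay product / {p:.1f}")
```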



X. CONCLUSIONS

In this paper we have introduced new TSPC latches and flipflops in order to
improve CMOS circuit performance. The performance of a synchronous CMOS circuit is
determined to a large extent by the latches and flipflops used. The slow p-transistor and the
need for complementary outputs make the p-block in the commonly accepted n-p pipeline
structure the speed bottleneck. The slope-sensitivity problem demands ever shorter
clock slopes, which leads to huge clock buffers and unacceptable power
consumption. Low-power design calls for static TSPC flipflops. These problems can
be solved or alleviated by the proposed new TSPC latches and flipflops.

A TSPC single-stage full-latch has been proposed for improving the original TSPC
pipeline. By combining the proposed full-latch with the original TSPC stages, the delay
of the p-block, the speed bottleneck, can be reduced by 30%, becoming comparable to the
delay of the n-block. The power-delay product of such a pipeline can be reduced by a factor
of 1.9. The allowable maximum clock slope is generally increased, in the best case
by a factor of 3. A fast and robust TSPC double-pipeline is introduced by using the
p- and n-versions of the proposed full-latch stage for the front and end stages of the
pipeline, which are the most critical in speed and robustness.

Dual-rail latches inherently have complementary outputs and are usually
non-precharged, e.g. the CVSL-type and the RAM-type. However, our investigation
indicates that their p-blocks are a serious speed bottleneck (worse than in the original
TSPC) and that the ratio problem needs great care. To address these drawbacks, new
dual-rail latches and flipflops have been proposed. Among them are the dynamic, semi-static and
fully-static versions of ratio-insensitive cross-coupled latches, the STC latches
(single-transistor-clocked latches, STC-1 and STC-2) and the fast flipflops using the STC
latches together with the TSPC SP-stages. They are easier to design and show very high
performance. The delays are reduced by factors of 2.1, 2.2 and 2.4 respectively for the
dynamic, semi-static and fully-static flipflops, compared with their original
counterparts. At the same time, the power consumption is greatly reduced, so the
power-delay products are improved by factors of 3.5, 3.4 and 6.5 respectively. Although
their allowable maximum clock slopes cannot compete with those of the CVSL-type and
RAM-type latches, they have been improved by factors of 1.5-2 compared with the original
TSPC. A unique feature of the flipflops using STC latches is that all logic transistors are
n-type (for both n- and p-latches and for both dynamic and static versions), which leaves
ample room for speed in such a pipeline.

We therefore conclude that the proposed TSPC latches and flipflops are superior
in both speed and power consumption and can significantly improve existing CMOS
circuit performance.
CLAIMS

1. A kind of CMOS non-precharged circuits containing the combinations of the
following two circuit blocks: 

a. circuit block 1, cb-1, characterised in that during its unlatching phase the logic
path(s) between its output(s) and relevant input(s) is (are) transparent for both high 
and low output states and during its latching phase the logic path(s) between its 
output(s) and relevant input(s) is (are) isolated only in one of the two states for a 
particular output called the isolated-logic-state of the output (either low or high), 
where its unlatching and latching are controlled by the two states, state 1 and state 
2 respectively, of a single clock connected to cb-1; 

b. circuit block 2, cb-2, characterised in that during its unlatching phase the logic 
path(s) between the output(s) and relevant input(s) is (are) transparent only in one 
of the two states for a particular input called the non-transparent-logic-state of the 
input which must be identical to the isolated-logic-state of the output of cb-1 if the 
output of cb-1 is connected to the input and during its latching phase the logic
path(s) between the output(s) and relevant input(s) is (are) isolated for both high 
and low input states, where its unlatching and latching are oppositely controlled by 
the two states, state 2 and state 1 respectively, of the same clock connected to both 
cb-1 and cb-2; 

2. A p-type dynamic differential arrangement of cb-1, pdd-cb-1, according to Claim 1, 
characterised in that it comprises: 

a. two n-transistors, n-1 and n-2, with both their sources grounded, with the gate and
drain of n-1 as INPUT and OUTPUTBAR of pdd-cb-1 respectively and with the
gate and drain of n-2 as INPUTBAR and OUTPUT of pdd-cb-1 respectively;

b. two p-transistors, p-1 and p-2, with the drain of p-1 connected to both the drain of
n-1 and the gate of p-2 and with the drain of p-2 connected to both the drain of n-2
and the gate of p-1;

c. a third p-transistor, p-3, with its source connected to power, with its drain
connected to both sources of p-1 and p-2 and with its gate as the clock input,
CLOCK, of pdd-cb-1;

3. A p-type fully-static differential arrangement of cb-1, psd-cb-1, according to Claims 
1 and 2, characterised in that it comprises: 

a. a pdd-cb-1 consisting of two n-transistors, n-1 and n-2, three p-transistors, p-1, p-2
and p-3, and corresponding connections according to Claim 2, where its INPUT,
INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK become the INPUT,
INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK of psd-cb-1;

b. two new n-transistors, n-3 and n-4, with their sources connected to OUTPUT and 
OUTPUTBAR respectively, with both their drains connected to the drain of p-3; 

c. an inverter with its input connected to the drain of p-3 and its output connected to 
both the gates of n-3 and n-4; 

4. An n-type dynamic differential arrangement of cb-2, ndd-cb-2, according to Claim 
1, characterised in that it comprises: 

a. two p-transistors, p-1 and p-2, with both their sources connected to power, with
the gate of p-1 connected to the drain of p-2 and with the gate of p-2 connected to
the drain of p-1;

b. two n-transistors, n-1 and n-2, with the drain of n-1 connected to the drain of p-1,
with the drain of n-2 connected to the drain of p-2, with the gate and drain of n-1
as INPUT and OUTPUTBAR of ndd-cb-2 respectively and with the gate and drain
of n-2 as INPUTBAR and OUTPUT of ndd-cb-2 respectively;

c. a third n-transistor, n-3, with its source grounded, with its drain connected to both 
sources of n-1 and n-2 and with its gate as the clock input, CLOCK, of ndd-cb-2; 

5. An n-type fully-static differential arrangement of cb-2, nsd-cb-2, according to 
Claims 1 and 4, characterised in that it comprises: 

a. an ndd-cb-2 consisting of two p-transistors, p-1 and p-2, three n-transistors, n-1,
n-2 and n-3, and corresponding connections according to Claim 4, where its INPUT,
INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK become the INPUT, 
INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK of nsd-cb-2; 

b. two new n-transistors, n-4 and n-5, with their sources grounded, with both the 
drain of n-4 and the gate of n-5 connected to OUTPUT and with both the drain of 
n-5 and the gate of n-4 connected to OUTPUTBAR; 

6. A non-precharged dynamic differential positive edge-triggered flipflop according to 
Claims 1, 2 and 4, characterised in that it comprises: 

a. a pdd-cb-1 according to Claim 2, where its INPUT, INPUTBAR and CLOCK
become the INPUT, INPUTBAR and CLOCK of the flipflop;

b. an ndd-cb-2 according to Claim 4, where its OUTPUT, OUTPUTBAR and 
CLOCK become the OUTPUT, OUTPUTBAR and CLOCK of the flipflop and its 
INPUT and INPUTBAR are connected with the OUTPUT and OUTPUTBAR of 
the pdd-cb-1 respectively;
7. A non-precharged fully-static differential positive edge-triggered flipflop according
to Claims 1, 2, 3, 4 and 5, characterised in that it comprises: 

a. a psd-cb-1 according to Claim 3, where its INPUT, INPUTBAR and CLOCK
become the INPUT, INPUTBAR and CLOCK of the flipflop;

b. an nsd-cb-2 according to Claim 5, where its OUTPUT, OUTPUTBAR and
CLOCK become the OUTPUT, OUTPUTBAR and CLOCK of the flipflop and its
INPUT and INPUTBAR are connected with the OUTPUT and OUTPUTBAR of
the psd-cb-1 respectively;

8. A non-precharged low-clock-idlable semi-static differential positive edge-triggered 
flipflop according to Claims 1, 2, 4 and 5, characterised in that it comprises: 

a. a pdd-cb-1 according to Claim 2, where its INPUT, INPUTBAR and CLOCK
become the INPUT, INPUTBAR and CLOCK of the flipflop;

b. an nsd-cb-2 according to Claim 5, where its OUTPUT, OUTPUTBAR and
CLOCK become the OUTPUT, OUTPUTBAR and CLOCK of the flipflop and its
INPUT and INPUTBAR are connected with the OUTPUT and OUTPUTBAR of
the pdd-cb-1 respectively;

9. A non-precharged high-clock-idlable semi-static positive edge-triggered flipflop 
according to Claims 1, 2, 3 and 4, characterised in that it comprises: 

a. a psd-cb-1 according to Claim 3, where its INPUT, INPUTBAR and CLOCK
become the INPUT, INPUTBAR and CLOCK of the flipflop;

b. an ndd-cb-2 according to Claim 4, where its OUTPUT, OUTPUTBAR and
CLOCK become the OUTPUT, OUTPUTBAR and CLOCK of the flipflop and its
INPUT and INPUTBAR are connected with the OUTPUT and OUTPUTBAR of
the psd-cb-1 respectively;

10. An n-type dynamic differential terminative stage, ndd-ts, characterised in that it 
comprises: 

a. two p-transistors, p-1 and p-2, with both their sources connected to power, with
the gate of p-1 connected to the drain of p-2 and with the gate of p-2 connected to
the drain of p-1;

b. two n-transistors, n-1 and n-2, with both their sources grounded, with the drain of
n-1 connected to the drain of p-1, with the drain of n-2 connected to the drain of
p-2, with the gate and drain of n-1 as INPUT and OUTPUTBAR of the stage
respectively and with the gate and drain of n-2 as INPUTBAR and OUTPUT of the
stage respectively;

11. A non-precharged dynamic differential negative edge-triggered flipflop according
to Claims 1, 2, 4 and 10, characterised in that it comprises: 

a. an ndd-cb-2 according to Claim 4, where its INPUT, INPUTBAR and CLOCK 
become the INPUT, INPUTBAR and CLOCK of the flipflop; 

b. a pdd-cb-1 according to Claim 2, where its INPUT and INPUTBAR are connected
with the OUTPUT and OUTPUTBAR of the ndd-cb-2 respectively; 

c. an ndd-ts according to Claim 10, where its OUTPUT and OUTPUTBAR become 
the OUTPUT and OUTPUTBAR of the flipflop and its INPUT and INPUTBAR 
are connected with the OUTPUT and OUTPUTBAR of the pdd-cb-1 respectively;

12. A non-precharged differential negative edge-triggered segment in a pipeline 
according to Claim 6 or 7 or 8 or 9, characterised in that: 

a. it comprises a flipflop claimed in Claim 6 or 7 or 8 or 9, in which cb-1 (pdd-cb-1 or
psd-cb-1) and cb-2 (ndd-cb-2 or nsd-cb-2) are interchanged, so that the INPUT
and INPUTBAR of cb-2 become the INPUT and INPUTBAR of the segment and
the OUTPUT and OUTPUTBAR of cb-1 become the OUTPUT and
OUTPUTBAR of the segment;

b. it requires the succeeding circuit block to be a cb-2 according to Claim 1;

13. The opposite arrangement of the circuit blocks in Claim 2 or 3 or 4 or 5, i.e.
ndd-cb-1 or nsd-cb-1 or pdd-cb-2 or psd-cb-2 (opposite to pdd-cb-1 or psd-cb-1 or
ndd-cb-2 or nsd-cb-2 respectively), characterised in that it:

a. uses the original arrangement in Claim 2 or 3 or 4 or 5; 

b. maintains the INPUT, INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK of the 
original arrangement; 

c. changes the original p-transistors to n-transistors, the original n-transistors to p- 
transistors, the original power to ground and the original ground to power; 

14. A non-precharged fully-static differential positive edge-triggered flipflop according 
to Claims 1, 5 and 13, characterised in that it comprises: 

a. a psd-cb-2 according to Claim 13, where its INPUT, INPUTBAR and CLOCK 
become the INPUT, INPUTBAR and CLOCK of the flipflop; 

b. an nsd-cb-2 according to Claim 5, where its OUTPUT, OUTPUTBAR and 
CLOCK become the OUTPUT, OUTPUTBAR and CLOCK of the flipflop and its 
INPUT and INPUTBAR are connected with the OUTPUT and OUTPUTBAR of
the psd-cb-2 respectively; 

15. The opposite arrangement of the flipflop or circuit segment claimed in Claim 6 or 7
or 8 or 9 or 11 or 12 or 14, characterised in that it:

a. uses the original arrangement in Claim 6 or 7 or 8 or 9 or 11 or 12 or 14;

b. maintains the INPUT, INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK of the 
original arrangement; 

c. makes replacements between pdd-cb-1 and ndd-cb-1, between psd-cb-1 and
nsd-cb-1, between ndd-cb-2 and pdd-cb-2 and between nsd-cb-2 and psd-cb-2;

d. changes the original positive edge-triggered feature to a negative edge-triggered 
feature or the original negative edge-triggered feature to a positive edge-triggered 
feature; 

16. A single-end-input arrangement for the flipflop or circuit segment claimed in Claim
6 or 7 or 8 or 9 or 11 or 12 or 14 or 15, characterised in that it:

a. uses the flipflop or circuit segment in Claim 6 or 7 or 8 or 9 or 11 or 12 or 14
or 15;

b. uses its INPUT, OUTPUT, OUTPUTBAR and CLOCK for the INPUT, OUTPUT,
OUTPUTBAR and CLOCK of the single-end-input arrangement of the flipflop or
circuit segment;

c. uses a CMOS inverter with its input connected to the INPUT of the flipflop or 
circuit segment and its output connected to the INPUTBAR of the flipflop or 
circuit segment; 

17. A logic-included arrangement for the flipflop or circuit segment claimed in Claim 6 
or 7 or 8 or 9 or 11 or 12 or 14 or 15, characterised in that:

a. n-1 and n-2 in cb-1 and/or cb-2 and/or ndd-ts according to Claims 1, 2, 3, 4, 5 and 
10 in the flipflop or circuit segment are replaced by n-network-1 and n-network-2
respectively; 

b. n-network-1 is a network of n-transistors with a drain-end and a source-end
connected in the same way as the drain and source of n-1, with their
gates connected to the input-vector, INPUTV; the network is conducting for a
set of input-vectors, INPUTV-A, and nonconducting for the complementary set,
INPUTV-B;

c. n-network-2 is another network of n-transistors with a drain-end and a source-end
connected in the same way as the drain and source of n-2, with their
gates connected to the input-vector, INPUTBARV; the network is conducting
for a set of input-vectors, INPUTBARV-A, and nonconducting for the
complementary set, INPUTBARV-B;

d. there can be more than one cb-1 and/or cb-2 and/or ndd-ts in the flipflop or circuit
segment with the connection order according to Claim 6 or 7 or 8 or 9 or 11 or 12
or 14 or 15 or 16;

e. there can be other non-precharged CMOS stages between cb-1, cb-2 and ndd-ts, but
a logic inversion between the output(s) of cb-1 and the input(s) of cb-2 is
forbidden;

18. A separate-stage p-type dynamic differential cb-1 arrangement, ss-pdd-cb-1, 
replacing pdd-cb-1 and a separate-stage n-type dynamic differential cb-1 
arrangement, ss-ndd-cb-1, replacing ndd-cb-1 in the flipflop or circuit segment 
claimed in Claim 6 or 8 or 11 or 12 or 15 or 16, characterised in that:

a. ss-pdd-cb-1 or ss-ndd-cb-1 consists of two identical separate stages, p-type or
n-type stage-1 and stage-2;

b. p-type stage 1 or 2 in ss-pdd-cb-1 consists of two p-transistors, p-1 and p-2, and 
one n-transistor, n-1, with the source of p-1 connected to power, with the drain of 
p-1 connected to the source of p-2, with the drain of p-2 as the output and
connected to the drain of n-1, with the source of n-1 grounded, with the gate of p-2
as the clock input and with both the gates of p-1 and n-1 as the input; 

c. n-type stage 1 or 2 in ss-ndd-cb-1 is constructed from the p-type stage 1 or 2
with the p-transistors replaced by n-transistors, the n-transistor replaced by a
p-transistor, the power replaced by ground and the ground replaced by power;

d. in both cases, the input and output of stage 1 become INPUT and OUTPUTBAR, 
the input and output of stage 2 become INPUTBAR and OUTPUT and the clock 
inputs of both stages 1 and 2 become the clock input, CLOCK; 

19. A single-end-input p-type dynamic differential cb-1 arrangement, si-pdd-cb-1,
replacing pdd-cb-1 and a single-end-input n-type dynamic differential cb-1
arrangement, si-ndd-cb-1, replacing ndd-cb-1 in the flipflop or circuit segment 
claimed in Claim 6 or 8 or 15, characterised in that: 

a. si-pdd-cb-1 uses the p-type stages 1 and 2 according to Claim 18 and si-ndd-cb-1 
uses n-type stages 1 and 2 according to Claim 18; 

b. in both cases, the output of the first stage is connected to the input of the second 
stage; 
c. in both cases, the input of the first stage becomes INPUT, the output of the first
stage becomes OUTPUTBAR, the output of the second stage becomes OUTPUT
and the clock inputs of both stages 1 and 2 become the clock input, CLOCK;

20. A ratio-insensitive p-type dynamic differential cb-2 arrangement, ri-pdd-cb-2,
replacing pdd-cb-2 in the flipflop or circuit segment claimed in Claim 15 or 16 or
18 or 19, characterised in that: 

a. it comprises two stages, stages 1 and 2; 

b. stage 1 consists of two p-transistors, p-1 and p-2, and two n-transistors, n-1 and
n-2, with the source of p-1 connected to power, with the drain of p-1 connected to
the source of p-2, with the drain of p-2 as the output and connected to the drain of
n-1, with the source of n-1 connected to the drain of n-2, with the source of n-2
grounded, with the gate of p-2 as the clock input, with the gate of n-1 as input 1
and with both gates of p-1 and n-2 connected as input 2;

c. stage 2 is identical with stage 1 but the transistors are named p-3, p-4, n-3 and
n-4 instead of p-1, p-2, n-1 and n-2;

d. the output of stage 1 is connected with input 1 of stage 2 and becomes OUTPUT, 
the output of stage 2 is connected with input 1 of stage 1 and becomes
OUTPUTBAR, input 2 of stage 1 becomes INPUTBAR, input 2 of stage 2 
becomes INPUT and both clock inputs become CLOCK; 

21. A ratio-insensitive p-type fully-static differential cb-2 arrangement, ri-psd-cb-2, 
replacing psd-cb-2 in the flipflop or circuit segment claimed in Claim 15 or 16 or 18 
or 19, characterised in that it comprises: 

a. five p-transistors, p-1, p-2, p-3, p-4 and p-5, with the sources of p-1, p-4 and p-5
connected to power, with the drain of p-1 connected to both sources of p-2 and
p-3, with the gate of p-1 as the clock input, CLOCK, with the drain of p-2 as
OUTPUT, with the drain of p-3 as OUTPUTBAR, with the drain of p-4 and the
gate of p-5 connected to OUTPUT and with the drain of p-5 and the gate of p-4
connected to OUTPUTBAR;

b. five n-transistors, n-1, n-2, n-3, n-4 and n-5, with both sources of n-2 and n-3 
grounded, with the drain of n-2 connected to both the source of n-4 and the drain 
of n-1, with the drain of n-3 connected to both sources of n-5 and n-1, with the
drain of n-4 connected to OUTPUT, with the drain of n-5 connected to
OUTPUTBAR, with the gate of n-1 connected to CLOCK, with both the gates of
n-2 and p-2 as INPUTBAR and with both the gates of n-3 and p-3 as INPUT; 
22. A merged-type ss-pdd-cb-1 arrangement, mss-pdd-cb-1, replacing ss-pdd-cb-1 in
the circuit segment claimed in Claim 12, 15 and 20, characterised in that it: 

a. uses the original ss-pdd-cb-1 claimed in Claim 18; 

b. maintains the INPUT, INPUTBAR, OUTPUT, OUTPUTBAR and CLOCK; 

c. removes the power-connected p-transistors in both stages; 

d. connects the source of the remaining p-transistor of stage 1 to INPUTBAR and the
source of the remaining p-transistor of stage 2 to INPUT; 
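The following Python sketch is offered only as an illustration of two of the claimed blocks and forms no part of the claims: a switch-level (not electrical) model of the ndd-cb-2 of Claim 4 and of the ri-pdd-cb-2 of Claim 20. It assumes 0/1 logic levels, treats a node whose pull-up and pull-down networks are both off as holding its previous value (dynamic storage), and, where both networks conduct at once, assumes the n-network is sized to win. The helper names resolve, settle, ndd_cb2 and ri_pdd_cb2 are hypothetical. Under these assumptions ndd-cb-2 is transparent while CLOCK is high and holds while CLOCK is low, and every output flip involves a pull-up/pull-down fight that only transistor sizing resolves. In ri-pdd-cb-2, which is transparent while CLOCK is low, p-1 and n-2 of each stage share input 2, so the two networks of a stage can never conduct together.

```python
"""Switch-level sketch of ndd-cb-2 (Claim 4) and ri-pdd-cb-2 (Claim 20).

Assumptions (not from the claims): 0/1 levels, a floating node keeps its
previous value, and when both networks conduct the n-network is assumed
to win. All function names are hypothetical.
"""
from itertools import product
from typing import Callable, Tuple

State = Tuple[int, int]  # (OUTPUT, OUTPUTBAR)


def resolve(pull_up: bool, pull_down: bool, old: int) -> Tuple[int, bool]:
    """Return (node value, True if pull-up and pull-down both conduct)."""
    if pull_up and pull_down:
        return 0, True  # ratioed fight: assume the n-network overpowers the p-network
    if pull_down:
        return 0, False
    if pull_up:
        return 1, False
    return old, False  # neither network on: node holds its charge


def settle(step: Callable[[State], Tuple[State, int]], state: State) -> Tuple[State, int]:
    """Re-evaluate both nodes until they stop changing; count fights seen."""
    fights = 0
    for _ in range(8):
        new_state, f = step(state)
        fights += f
        if new_state == state:
            break
        state = new_state
    return state, fights


def ndd_cb2(clock: int, inp: int, state: State) -> Tuple[State, int]:
    inpbar = 1 - inp

    def step(s: State) -> Tuple[State, int]:
        out, outb = s
        # OUTPUTBAR: p-1 (gate = OUTPUT) against n-1 (gate = INPUT) in series with footer n-3 (gate = CLOCK)
        outb, f1 = resolve(out == 0, clock == 1 and inp == 1, outb)
        # OUTPUT: p-2 (gate = OUTPUTBAR) against n-2 (gate = INPUTBAR) in series with n-3
        out, f2 = resolve(outb == 0, clock == 1 and inpbar == 1, out)
        return (out, outb), int(f1) + int(f2)

    return settle(step, state)


def ri_pdd_cb2(clock: int, inp: int, state: State) -> Tuple[State, int]:
    inpbar = 1 - inp

    def step(s: State) -> Tuple[State, int]:
        out, outb = s
        # Stage 1 drives OUTPUT: p-1 (gate INPUTBAR) + p-2 (gate CLOCK) pull up,
        # n-1 (gate OUTPUTBAR) + n-2 (gate INPUTBAR) pull down.
        out, f1 = resolve(inpbar == 0 and clock == 0, outb == 1 and inpbar == 1, out)
        # Stage 2 drives OUTPUTBAR: input 1 = OUTPUT, input 2 = INPUT.
        outb, f2 = resolve(inp == 0 and clock == 0, out == 1 and inp == 1, outb)
        return (out, outb), int(f1) + int(f2)

    return settle(step, state)


if __name__ == "__main__":
    # Exercise every CLOCK, INPUT and previous-state combination.
    for clock, inp, prev_out in product((0, 1), repeat=3):
        prev = (prev_out, 1 - prev_out)
        print(f"CLOCK={clock} INPUT={inp} prev={prev} "
              f"ndd-cb-2={ndd_cb2(clock, inp, prev)} "
              f"ri-pdd-cb-2={ri_pdd_cb2(clock, inp, prev)}")
```

Running the module prints each case as ((OUTPUT, OUTPUTBAR), fights). In this model the fight count for ri-pdd-cb-2 is always zero, which is one way to read the "ratio-insensitive" property: its correct operation does not rely on the n-network out-sizing the p-network.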
(a) From a precharged p-block.

(b) From a static p-block.

Fig. 2




Fig. 3 
(a) During latching. (b) During unlatching.

Fig. 6

(Clock from high to low)

Fig. 10
Fig. 14




Fig. 15 Fig. 16 
(a) After a precharged p-block. (b) After a modified non-precharged p-block.
Fig. 17 
(a) Critical stages in a double-pipeline.
(b) A high speed double-pipeline.



Fig. 18 
Fig. 20 Fig. 21
(b) N-latches.
Fig. 24




(a) Separated type. (b) Merged type in a pipeline. 

Fig. 25 




(a) Dual-rail type. 
Fig. 26 
(b) Single-rail type.
p-latch n-latch
Important note: risky in cascading the two latches. 
Fig. 27 




(a) Separated type. (b) Merged type in a pipeline 

Fig. 28 




STC-2 p-latch STC-1 n-latch 
(a) Dual-rail positive-edge triggered. 






STC-2 p-latch STC-1 n-latch 
(b) Single-rail positive-edge triggered.
(c) Negative-edge triggered section in a pipeline.
Fig. 29 




(a) Conflict-free version.




(b) Simplified version. 
Fig. 31 Fig. 32




Fig. 33 




(a) Separate-stage type. (b) Merged-stage type. 

Fig. 34 
(a) With an unmodified p-latch. (b) With a modified p-latch.

Fig. 35 




(a) Dual-rail positive-edge triggered. (b) Single-rail positive-edge triggered. 




(c) Negative-edge triggered section in a pipeline. 
Fig. 36 




(a) Low-clock idlable. (b) High-clock idlable. 

Fig. 37 